[VL][Delta] Add native Delta DV reader support#12040
Conversation
|
Run Gluten Clickhouse CI on x86 |
1a16894 to
66ea460
Compare
| struct DeltaProtocolInfo { | ||
| int32_t minReaderVersion; | ||
| int32_t minWriterVersion; | ||
| std::optional<std::vector<std::string>> readerFeatures; | ||
| std::optional<std::vector<std::string>> writerFeatures; | ||
|
|
||
| /// Check if this protocol supports deletion vectors. | ||
| /// Returns true if: | ||
| /// - minReaderVersion >= 3 | ||
| /// - minWriterVersion >= 7 | ||
| /// - 'deletionVectors' is in readerFeatures | ||
| bool supportsDeletionVectors() const { | ||
| if (minReaderVersion < 3 || minWriterVersion < 7) { | ||
| return false; | ||
| } | ||
| if (!readerFeatures.has_value()) { | ||
| return false; | ||
| } | ||
| return std::find(readerFeatures->begin(), readerFeatures->end(), "deletionVectors") != readerFeatures->end(); | ||
| } | ||
| }; |
There was a problem hiding this comment.
In which case do we need this struct and the validation logics? Won't we only use native DV features when the Java side feature exists?
|
|
||
| namespace { | ||
|
|
||
| class DeltaConnectorTest : public ::testing::Test { |
There was a problem hiding this comment.
Can we also add some test cases that are more E2E? E.g., read a file with a passed-in DV, then verify whether the unwanted rows are filtered out?
| void loadSerializedDeletionVector( | ||
| std::string_view serializedPayload, | ||
| std::optional<uint64_t> expectedCardinality = std::nullopt); |
There was a problem hiding this comment.
Can we add comments explaining expectedCardinality?
|
Follow-up for the failed cpp-test-udf-test jobs: the new connector-level Delta read test was aborting because the test executable still used gtest_main, so Folly singleton registration was never completed before Velox query execution requested the Timekeeper.\n\nPushed 08353bc to make velox_delta_read_test own main(), call folly::Init, and link GTest::gtest instead of GTest::gtest_main. |
08353bc to
6be05d2
Compare
What changes are proposed in this pull request?
This PR is the second step in the split Delta deletion-vector (DV) stack, following #12001.
It adds the native Velox-side Delta DV reader layer that consumes the roaring bitmap payload facilities introduced by #12001, without adding the JVM-side Delta scan metadata handoff yet.
Main changes:
DeltaDeletionVectorReaderto load materialized Delta DV payloads usingRoaringBitmapArrayDeltaSplitReaderto validate DV protocol/statistics metadata and apply row-index filtering semanticsThis PR is intentionally native-reader only:
Those pieces will be added in follow-up split PRs.
issue #11901.
How was this patch tested?
Added focused native test coverage in:
cpp/velox/compute/delta/tests/DeltaConnectorTest.cppcpp/velox/compute/delta/tests/DeltaSplitTest.cppcpp/velox/compute/delta/tests/DeltaDeletionVectorReaderTest.cppCovered cases:
RoaringBitmapArrayValidation run:
malinjawi/incubator-gluten:mainon the combined PR2 branch: all checks passed after rerunning two infra-flaky jobsgit diff --check upstream/main...HEAD/opt/homebrew/opt/llvm@15/bin/clang-formatover changed C++ filesWas this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB